Statistical methods have been used in comparative linguistics since at least the 1950s (see Swadesh list). Since about 2000 there has been renewed interest in the topic. This interest comes from applying methods of computational phylogenetics and cladistics to find an optimal tree (or network) that represents a hypothesis about the evolutionary ancestry of a group of languages and, perhaps, the contacts between them. The probability that languages are related can be quantified, and proto-languages can sometimes be approximately dated. The topic came to the attention of the popular press in 2003 after a short study on Indo-European was published in ''Nature'' (Gray and Atkinson 2003). A volume of articles, ''Phylogenetic Methods and the Prehistory of Languages'', was published in 2006 following a conference held in Cambridge in 2004.

A goal of comparative historical linguistics is to identify instances of genetic relatedness amongst languages.〔Harrison, On the limits of the comparative method, in Joseph and Janda, The Handbook of Historical Linguistics (2003)〕 The steps in quantitative analysis are:

(i) to devise a procedure based on theoretical grounds, on a particular model, on past experience, etc.;
(ii) to verify the procedure by applying it to data for which a large body of linguistic opinion exists for comparison (this may lead to a revision of the procedure of stage (i) or, at the extreme, to its total abandonment);
(iii) to apply the procedure to data on which linguistic opinions have not yet been produced, have not yet been firmly established, or are perhaps even in conflict.〔Embleton, Statistics in Historical Linguistics, 1986〕

Applying phylogenetic methods to languages is a multi-stage process:
(a) the encoding stage: getting from real languages to some expression of the relationships between them in the form of numerical or state data, so that those data can then be used as input to phylogenetic methods;
(b) the representation stage: applying phylogenetic methods to extract from those numerical and/or state data a signal that is converted into some useful form of representation, usually a two-dimensional graphical one such as a tree or network, which synthesises and "collapses" what are often highly complex multidimensional relationships in the signal;
(c) the interpretation stage: assessing those tree and network representations to extract from them what they actually mean for real languages and their relationships through time.〔Heggarty, "Interdiscipline Indiscipline", in Forster and Renfrew (eds.), Phylogenetic Methods and the Prehistory of Languages (2006)〕

==Background==

The standard method for assessing language relationships has been the comparative method. However, it has a number of limitations. Not all linguistic material is suitable as input, and there are issues concerning the linguistic levels on which the method operates. The reconstructed languages are idealized, and different scholars can produce different results. Language family trees are often used in conjunction with the method, and "borrowings" must be excluded from the data, which is difficult when the borrowing occurs within a family.
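
The encoding stage described above, together with the need to exclude borrowings, can be illustrated with a minimal Python sketch. It assumes that cognate judgements have already been made by hand: each language maps each meaning to a cognate-class label, and known loanwords are flagged so that they are left out before encoding. The names `COGNATES`, `BORROWINGS` and `encode_binary`, and all of the languages, meanings and labels, are hypothetical toy data rather than material from any real database. Each (meaning, cognate class) pair becomes one binary character, which is one possible way of preparing state data for phylogenetic software, not the coding scheme of any particular study.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: language -> {meaning: cognate-class label}.
# The labels are illustrative, not real cognate judgements.
COGNATES: Dict[str, Dict[str, str]] = {
    "LangA": {"hand": "h1", "water": "w1", "dog": "d1"},
    "LangB": {"hand": "h1", "water": "w2", "dog": "d1"},
    "LangC": {"hand": "h2", "water": "w2", "dog": "d2"},
}

# Hypothetical loanword flags: (language, meaning) pairs whose form is borrowed.
BORROWINGS: Set[Tuple[str, str]] = {("LangC", "dog")}


def encode_binary(
    cognates: Dict[str, Dict[str, str]],
    borrowings: Set[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Turn cognate judgements into a binary presence/absence matrix.

    Each (meaning, cognate class) pair is one character. A language scores
    "1" if its word for that meaning is in the class, "0" if its word is in
    another class, and "?" (missing) if it has no word for the meaning or
    its word is flagged as a borrowing.
    """
    # Characters are built only from native (non-borrowed) forms.
    characters = sorted(
        {
            (meaning, cls)
            for lang, words in cognates.items()
            for meaning, cls in words.items()
            if (lang, meaning) not in borrowings
        }
    )
    matrix: Dict[str, List[str]] = {}
    for lang, words in cognates.items():
        row: List[str] = []
        for meaning, cls in characters:
            if meaning not in words or (lang, meaning) in borrowings:
                row.append("?")  # no usable native form for this meaning
            elif words[meaning] == cls:
                row.append("1")
            else:
                row.append("0")
        matrix[lang] = row
    return characters, matrix


if __name__ == "__main__":
    chars, matrix = encode_binary(COGNATES, BORROWINGS)
    print(" ".join(f"{m}:{c}" for m, c in chars))
    for lang, row in matrix.items():
        print(lang, "".join(row))
```

Run as a script, it prints the character list and one row of 0/1/? codes per language. LangC receives "?" in the "dog" column, and the class it would otherwise have contributed never becomes a character. Coding borrowed forms as missing data is a design choice of this sketch; other treatments of loans are possible.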

It is often claimed that the method is limited in the time depth over which it can operate. The method is also difficult to apply, and there is no independent test.〔McMahon and McMahon, Language Classification by Numbers, 2003〕 Alternative methods have therefore been sought that are formalised, quantify the relationships and can be tested.

Probably the first published quantitative historical linguistics study was by Sapir in 1916,〔Time perspective in aboriginal American culture, Memoir 10, Anthropological Series 13, Ottawa〕 while Kroeber and Chretien in 1937〔Quantitative classification of Indo-European languages, Language 13〕 investigated nine Indo-European (IE) languages using 74 morphological and phonological features (extended in 1939 by the inclusion of Hittite). In 1950 Ross〔Philological probability problems, Journal of the Royal Statistical Society Series B, 12〕 investigated the theoretical basis for such studies.

Swadesh, using word lists, developed lexicostatistics and glottochronology in a series of papers〔For example, Lexico-statistical dating of prehistoric ethnic contacts, Proceedings of the American Philosophical Society, 6 (1952)〕 published in the early 1950s, but these methods were widely criticised,〔For example, by Bergsland and Vogt, On the validity of glottochronology, Current Anthropology 3 (1962)〕 though other scholars regarded some of the criticisms as unjustified. In 1986 Embleton published a book, "Statistics in Historical Linguistics", which reviewed previous work and extended the glottochronological method.
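
Lexicostatistics compares languages by the proportion of shared cognates on a standard word list, and glottochronology turns that proportion into a time depth by assuming a constant retention rate. In the classic formulation, if each of two languages independently retains a fraction r of the list per millennium, after t millennia they are expected to share c = r^(2t) of it, so t = ln c / (2 ln r). The Python sketch below implements that formula. The function names are hypothetical, the toy word lists are invented, and the default r = 0.86 per millennium (a value often quoted for the 100-word list) is an assumption exposed as a parameter. The constant-rate assumption is among the points the method's critics disputed.

```python
import math
from typing import Dict


def shared_cognate_fraction(list_a: Dict[str, str], list_b: Dict[str, str]) -> float:
    """Lexicostatistic similarity: the fraction of meanings present in both
    lists whose words carry the same cognate-class label."""
    common = [m for m in list_a if m in list_b]
    if not common:
        raise ValueError("no meanings in common")
    same = sum(1 for m in common if list_a[m] == list_b[m])
    return same / len(common)


def glottochronology_time(c: float, r: float = 0.86) -> float:
    """Divergence time in millennia, t = ln(c) / (2 ln(r)).

    c: proportion of shared cognates, 0 < c <= 1.
    r: assumed retention rate per millennium, 0 < r < 1 (illustrative default).
    """
    if not 0 < c <= 1:
        raise ValueError("c must be in (0, 1]")
    if not 0 < r < 1:
        raise ValueError("r must be in (0, 1)")
    return math.log(c) / (2 * math.log(r))


if __name__ == "__main__":
    # Hypothetical cognate-class labels for a tiny five-meaning list.
    lang_x = {"I": "a", "you": "b", "two": "c", "water": "d", "fire": "e"}
    lang_y = {"I": "a", "you": "b", "two": "c", "water": "x", "fire": "e"}
    c = shared_cognate_fraction(lang_x, lang_y)  # 4 of 5 meanings cognate
    print(f"c = {c:.2f}, t = {glottochronology_time(c):.2f} millennia")
```

With four of five meanings cognate, c = 0.8 and the default r gives roughly three quarters of a millennium. Real studies use 100- or 200-item lists, and any date obtained this way is only as reliable as the constant-rate assumption behind it.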

Dyen, Kruskal and Black carried out a study of the lexicostatistical method on a large IE database in 1992.〔An Indoeuropean classification: a lexicostatistical experiment, Transactions of the American Philosophical Society 82/5〕 In the mid-1990s a group at the University of Pennsylvania computerised the comparative method, using a different IE database of 20 ancient languages.〔Ringe, Warnow and Taylor, Indo-European and Computational Cladistics, Transactions of the Philological Society Volume 100 (2003)〕 Several software programs were then developed in biology that could be applied to historical linguistics. In particular, a group at the University of Auckland developed a method that gave controversially old dates for IE languages.〔Initially announced in Gray and Atkinson, Language-tree divergence times support the Anatolian theory of Indo-European origin, Nature 426, 27 November 2003〕 A conference on "Time-depth in Historical Linguistics", held in August 1999, discussed many applications of quantitative methods.〔Published by Renfrew, McMahon and Trask in 2000〕 Many papers have since been published on studies of various language groups and on comparisons of the methods.

Source: Wikipedia, the free encyclopedia (excerpt from the article "Quantitative comparative linguistics").